COMMUNICATION SERVICE SYSTEM



PUB. NO.: 2002-007272 [JP 2002007272 A]
ABSTRACT

PROBLEM TO BE SOLVED: To provide a system that uses a bulletin board and
a contact method having area, real-time, anonymity, and purpose properties
to mediate in specifying and selecting the person concerned, between an
unspecified person who wishes to make contact and an unspecified person
to be contacted.

SOLUTION: When a user terminal accesses a service server to write a
message, the service server requests the user terminal to select a
contact means from at least mail, real-time chat, or voice message, to set
the term of validity of the message, to select a desired genre, and to
send a message body; the message text received from the user terminal is
classified into the corresponding genre and recorded in a bulletin board
database together with the term of validity and the contact means. When
the user terminal gains access for message browsing, the user terminal is
requested to select a desired genre, and all messages corresponding to the
desired genre received from the user terminal are extracted from the
bulletin board database to generate bulletin board information, which is
displayed on the user terminal.
ABSTRACT

PROBLEM TO BE SOLVED: To join in a multimedia electronic bulletin board
by using the mechanism of an electronic mail system (E-mail) when
sending data to be posted onto a bulletin board and the mechanism of
the multimedia electronic bulletin board system (WWW) when posting and
browsing the data.

SOLUTION: Electronic mail data are processed separately by two systems,
a generation system and a management system. In the generation system,
an electronic mail data analysis part 4 for multimedia electronic
bulletin board constitution generates a group of files in a format that
can be displayed through a WWW browser, producing a multimedia
electronic bulletin board constitution file group. In the management
system, on the other hand, an electronic mail data analysis part 10 for
multimedia electronic bulletin board management operates on files in the
multimedia electronic bulletin board constitution file group 14 according
to the contents of the management command definition part 13 to manage
the multimedia bulletin board. Thus, a multimedia electronic bulletin
board that can easily be used in common for originating information is
created.
XRPX Acc No: N00-412441

Data updating method for user interactive electronic information 
providing system in Internet, involves generating virtual search 
objects relevant to user's interest and bulletin board is scanned to 
classify users 

Patent Assignee: HERZ F S M (HERZ-I) 

Inventor: HERZ F S M 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No    Kind  Date      Applicat No  Kind  Date      Week
US 6029195   A     20000222  US 94346425  A     19941129  200051 B
                             US 9632461   P     19961209
Patent No    Kind  Lan  Pg  Main IPC     Filing Notes
US 6029195   A          63  G06F-015/16  CIP of application US 94346425
                                         Provisional application US 9632461
                                         CIP of patent US 5758257

Abstract (Basic): US 6029195 A

NOVELTY - Target profiles are generated relevant to the contents of
target bulletin boards. The user-preferred data is retrieved for
each user, using the profiles. Virtual search objects relevant to
the user are generated. Each bulletin board is scanned for relevance to
the target object, and user groups are classified according to their
interests.

DETAILED DESCRIPTION - User groups having a common interest in
particular object data are identified. Then, the identified user is
matched with the other users to create a new bulletin board.
The matched user group is generated as an e-mail list, and the list
is forwarded to the user concerned. New users relevant to the new
bulletin board are added to the user's list.

USE - For a user-interactive electronic information providing system
on the Internet, used in providing news, advertisements, and various
data. Also used in TV broadcasting, advertisement research, and on-line
video conferencing for business, school, and job training purposes.

ADVANTAGE - Facilitates access to desired data with less access time
by modifying the electronic bulletin boards periodically. Eases editing
of documents in online conferencing, thereby promoting product design
and operability.

DESCRIPTION OF DRAWING(S) - The figure shows the flow chart
representing the user-interactive data accessing method.
Automated web site creation and access system has web access module for
generating and presenting listings of all community of practice websites, 
in which access is provided to community of practice websites upon 
selection from listings 

Patent Assignee: QWEST COMMUNICATIONS INT INC (QWES-N) 

Inventor: KENYON J D 

Number of Countries: 001 Number of Patents: 001 
Patent Family: 

Patent No    Kind  Date      Applicat No  Kind  Date      Week
US 6701343   B1    20040302  US 99452526  A     19991201  200424 B

Priority Applications (No Type Date): US 99452526 A 19991201
Patent Details:

Patent No    Kind  Lan  Pg  Main IPC     Filing Notes
US 6701343   B1         16  G06F-015/16
Abstract (Basic): US 6701343 B1

NOVELTY - The system has a web access module for generating and
presenting one or more listings of all the community of practice
websites, including status information for each of the community of
practice websites. Access is provided to one or more of the community
of practice websites upon selection from the listings.

DETAILED DESCRIPTION - A website generator enters the custom
information into the templates included in a database to generate the
community of practice websites, and stores the community of practice
websites in another database. Each of the community websites is of
standardized format including several automatically linked web pages.
An interface, accessible over a data network, is configured to receive
custom information for creating the community of practice websites.
INDEPENDENT CLAIMS are included for the following:

(a) Community of practice server; and

(b) Creating and providing access to a community web site on
a web-based server.

USE - For simplified generation of community of practice web
pages which are accessible and updateable by a number of parties.

ADVANTAGE - Enables a system user to enter a minimal amount of
information to create a website which is accessible and updateable by
other members of the community of practice. Constructs a website in
such a manner that after a predetermined period of non-use, it may be
placed in an archive until revived at a selected point in time.
Provides a system that is connectable to the Internet, an intranet or
extranet to provide functions such as creating, viewing or updating of
websites. Configures a newly created website to provide for unlimited
access over the network, or security features may be employed to limit
access.

DESCRIPTION OF DRAWING(S) - The figure shows the system diagram for
the community of practice server.
pp; 16 DwgNo 1/10
Method for automatically creating community in internet community
service

Patent Assignee: HAHMO.COM CO LTD (HAHM-N); LOCUS DIGITAL SERVICE JH
Inventor: YOON J H

Number of Countries: 001 Number of Patents: 002
KR 200022100 A 20000426
Filing Notes
Abstract (Basic): KR 2001097753 A

NOVELTY - A method for automatically creating a community in an
Internet community service is provided that creates communities far
more rapidly than the existing process, increasing competitiveness
against other sites that provide a uniform Internet community service.

DETAILED DESCRIPTION - A database of a web server stores a set of
community groupings and the detail items for each community
grouping (S310, S320). When a user connecting through the Internet
requests entry to the community service and a renewal of
registered contents, the web server receives user information from the
user (S330, S340). The web server receiving the user information
automatically registers it under the detail items of each community
grouping (S350 to S380).
pp; 1 DwgNo 1/10
Collaboration backbone for web-based collaborative systems, downloads
demon logic at clients and forms collaborative sessions of interactive
applications, based on system state maintained by server

Patent Assignee: UNIV SYRACUSE (UYSY-N)

Inventor: BECA L; CHENG G; FOX G C; JURGA T; OLSZEWSKI K; PODGORNY M;
SOKOLOWSKI P; WALCZAK K
Number of Countries: 001 Number of Patents: 001
Patent Family:
Priority Applications (No Type Date): US 9817840 A 19980203
Patent Details:
Abstract (Basic): US 6078948 A

NOVELTY - The client (210) includes demon (220) embedded in room 
page, to identify server (240) for receiving and forwarding messages 
routed to relevant entity based on message information. The server 
maintains system state including associations identifying demons, based 
on which demon logic is downloaded at respective clients and 
collaborative session of interacting instances of applications 
(230,235) is formed. 

DETAILED DESCRIPTION - The demon identifies, downloads and launches 
control logic (225) associated with room page and establishes a 
communication path between downloaded demon and downloaded control 
logic. The demon also identifies and launches an application and 
establishes a communication path between the downloaded demon and the 
launched application. An INDEPENDENT CLAIM is also included for the
method of forming a collaborative session of interacting instances of
an application in virtual mode.

USE - For forming web-based virtual communities having
virtual rooms with collaborative sessions, e.g. for chat rooms, shared
white boards, etc.

ADVANTAGE - Provides a powerful vehicle for distributing
collaborative applications. Provides a flexible design for the
development of new collaborative applications and for porting of old
applications into collaborative versions. Allows maximum flexibility
of floor control and session management.

DESCRIPTION OF DRAWING(S) - The figure shows the system
architectural diagram of the collaboration backbone.

Client (210) 

Demon (220) 

Control logic (225) 
Applications (230, 235) 

Server (240) 

pp; 29 DwgNo 2/14 

Title Terms: BACKBONE; WEB; BASED; SYSTEM; LOGIC; CLIENT; FORM; SESSION;
INTERACT; APPLY; BASED; SYSTEM; STATE; MAINTAIN; SERVE
Derwent Class: T01
International Patent Class (Main): G06F-015/163
File Segment: EPI



28/5/13 (Item 13 from file: 347) 

DIALOG (R) File 347:JAPIO 

(c) 2005 JPO & JAPIO. All rts. reserv. 



06171005 **Image available** 

METHOD AND SYSTEM FOR ASSISTING GENERATION AND ACTIVITY OF ELECTRONIC 
COMMUNITY SUPPORTING AND STORAGE MEDIUM STORING ASSISTING PROGRAM FOR 
GENERATION AND ACTIVITY OF ELECTRONIC COMMUNITY 



PUB. NO.: 11-112552 [JP 11112552 A]
PUBLISHED: April 23, 1999 (19990423)
INVENTOR(s): MIZOGUCHI YOICHI
             MAGOORI AKIHIRO
INTL CLASS: H04L-012/54; H04L-012/58; G06F-013/00

ABSTRACT 



PROBLEM TO BE SOLVED: To considerably simplify the procedure of a manager
and to considerably reduce the time required for work by associatively
displaying, in a tree form on a home page, log information on the
transmission/reception of electronic mail, an electronic mail to be
responded to, and a response electronic mail.

SOLUTION: A mailing list generation means 220 generates a mailing list
based on inputted mailing list generation information. A home page
generation means 240 generates the community home page, for displaying
the log of the mailing list, based on home page generation information.
A log recording means 280 records the log of the electronic mail which a
user transmits/receives in accordance with the mailing list. A home page
transmission means 202 compiles the relation between the recorded log
and the transmitted/received electronic mail into the tree form, adds it
to the generated home page, and transmits the page to a user terminal 100
for display.
ABSTRACT

PROBLEM TO BE SOLVED: To provide an electronic bulletin board system in
which even a personal computer or the like can, as required, refer to the
content of bulletin board information displayed on an electronic bulletin
board using a large-sized display device, and an information server can
confirm the content of the bulletin board information for an arbitrary
date in the past or in the future.

SOLUTION: A bulletin board information generator 100 is provided with
a bulletin board information storage means 111a that adds attribute
information to the bulletin board information and stores the result, and
a conversion means 113b that classifies the stored bulletin board
information based on the attribute information and periodically converts
the classified bulletin board information into at least two forms of
files. The electronic bulletin board system 200 is provided with a
selection means 211a that selects desired bulletin board information from
a plurality of the converted files periodically or according to a request
of a user, a transfer means 211b that transfers the selected bulletin
board information to the electronic bulletin board 200, and a display
means 212a that displays the transferred bulletin board information on an
input/output terminal 230.
Abstract: This paper describes the newest form of micro product
support, the online support service, which offers so many advantages that
it is well on its way to supplanting all the others. This support method is
a variation on the online user group that has been popular on the
consumer databanks for close to a decade. Online user groups, also
called Special Interest Groups, SIGs, or Forums, combine electronic
mail, conferencing, uploading, downloading, and online searching into an
efficient and practical means of information exchange. When applied to
product support, they permit prompt, authoritative, and personalized
responses to almost any question. Although there are many individual
variations, online support services maintain the two principal elements of
the online user group: a bulletin board for current messages, and
an archive containing a variety of programs and text files. At its
simplest, product users send questions by electronic mail to the
bulletin board, where company technicians in turn post answers. 9 refs.
The Internet creates a unique opportunity for building virtual
learning communities . This field study investigated the experiences of 12 
adults engaged in a computer-mediated education program using an 
asynchronous online conference. To reveal what fostered meaningful 
discourse and transformative learning, the study used an ethnographic 
participant-observation approach supported by interpretation of online 
transcripts, fieldnotes, a focus group discussion, questionnaires, and 
phone interviews. Participants explored aspects of their psychological and 
spiritual development, sharing their life stories through creative writing 
and imagery, online and in person for one year. Personal storytelling and 
virtual group discourse revealed examples of transpersonal experiences, in 
which the participant's sense of self-identity extended beyond (trans) the 
individual or personal to encompass wider aspects of relatedness to others, 
the natural world, or the cosmos. Participants reported the importance of 
pace and flow in online discourse as well as a sense of immersive presence. 
Sustained online discourse was found to be crucial in observing 
participatory thought and creating a supportive structure for collaborative 
learning. Seven key elements that fostered transformative learning are:
(1) Combine face-to-face meetings with virtual presence;
(2) Establish the container with attention to community size,
structure, tone, and intention;
(3) Structure the community to be self-creating, self-maintaining, and
self-defining through flexible curriculum design and whole-group
learning;
(4) Encourage the development of in-the-moment self-awareness,
mindfulness, and immersive presence;
(5) Guide risk-taking through shared feelings, life experiences, and
reclaimed projections;
(6) Welcome humor, improvisation, and creative expression;
(7) Share the search for meaning: See all of life and education as a
transformational journey.
Detailed descriptions of program development, structure, facilitation, 
and curriculum are offered that could be applied to a range of different 
lifelong learning settings. 
Abstract: Matching the information needs of Internet users with the
content on the Web demands better modeling of the needs of the users.
Searching the explosive content on the Internet merely with keywords is
not a smart solution. Keyword-based searches and other traditional
methods will soon give way to efficient information-foraging tools. Even
the largest search engines cannot keep pace with the scaling up of the
Web. Moreover, the "one-size-fits-all search" must yield to user-adaptive
searches, which learn from the past behavior of users and communities.
This paper introduces the fascinating technologies that are making their
way into hypertext information management products. We review research
prototypes and upcoming products and services in this space. Can useful but
hidden information on the Internet, like topic-specific communities and
e-groups with specific interests, be mined out of the plethora of
pages and links on the Web, thus forming a "vertical portal"? Can the
profile of the user and his interest areas be used to address his
queries better than with keywords alone? Recent research seems to suggest
that the structure and content of the Web may permit performing the above
tasks mostly automatically. (9 Refs)
Abstract: With the continuing explosive growth of the world wide web and
the Internet, there are many opportunities to use these technologies to
enhance our ability to function in an engineering and a management
capacity. Most of the emphasis relates to the world wide web and
developing websites based on the latest and greatest programming
languages and tools. There is also a great interest in information
processing, centered around using search engines and push technologies.
Although e-mail has been part of the Internet experience since the very
early days, it has evolved into an effective communications tool.
However, there are still opportunities to use the Internet to become more
effective in communication, project management, general management, and
content-specific information exchange. There are companies that provide
services to the on-line community that allow for the interaction of
two or more people in real-time in a number of fashions. Some of the
possible actions include: on-line and off-line messaging, multi-user
chatting, real-time file and URL transfer, notification of other users
currently on-line, Internet phone facilitation, and message history
logging. Some of these services are free to the user, while others charge
for the technology. The discussion of one program's capabilities
generates ideas for how the technology can make engineers and managers
more effective. (0 Refs)
Abstract: This paper explores the idea of harnessing computer networks to
overcome the knowledge acquisition bottleneck. We introduce the idea of a
CYLINA (cyberspace leveraged intelligent agent), an intelligent system that
gains knowledge/information through interactions with a large population of
network users. CYLINAs rely on small, incremental contributions from a
large population of knowledge experts. We consider potential applications
for CYLINAs, then focus on Auto-FAQ, an experimental system currently under
development at GTE Laboratories. Auto-FAQ is a question-answering system.
Its intent is to make information typically found in USENET News FAQs much
more accessible. It has many other uses as well. Users ask questions in
natural language forms. These questions index directly into the system's
infobase. Infobase entries are question-answer pairs. Answers can be raw
text, URLs, or links into existing entries in the system's infobase. By
using the system recursively, users can explore entire subjects with a
series of questions. Facilities exist to tag gaps in the system's knowledge
base. When a gap is found, it is posted to a public list. Individuals in
the cyberspace community can search the list, volunteer expertise,
and fill in gaps as appropriate. A version of Auto-FAQ is currently
operating on a private network at GTE Laboratories. The system is currently
able to answer basic questions about itself, WWW, and Mosaic. Future plans
are to make Auto-FAQ and its associated software available on the global
Internet. (0 Refs)
Abstract: An electronic community system encodes and manipulates the
range of knowledge and values necessary to function effectively in a
community or organization. The knowledge includes both formal data and
literature and informal results and news. The manipulation includes both
browsing through the available knowledge, and recording and sharing
interrelationships between the items. A large-scale experiment is underway
to build an electronic community system for the community of
scientists studying the nematode worm C. elegans, a model organism in
molecular biology. This paper discusses a model for community systems and
previous such systems in science, the biology experiment and a previous
system, the enabling technology for handling the knowledge, the enabling
mechanisms for handling the values, the state of the prototype, and
speculations on future applications in supporting organizational memory.
(17 Refs)
Abstract: This article describes research in the application of a Kohonen
Self-Organizing Map (SOM) to the problem of classification of
electronic brainstorming output and an evaluation of the results.
Electronic brainstorming is one of the most productive tools in the
Electronic Meeting System called GroupSystems. A major step in group
problem solving involves the classification of electronic brainstorming
output into a manageable list of concepts, topics, or issues that can
be further evaluated by the group. This step is problematic due to
information overload and the cognitive demand of processing a large
quantity of textual data. This research builds upon previous work
in automating the meeting classification process using a Hopfield neural
network. Evaluation of the Kohonen output comparing it with Hopfield
and human expert output using the same set of data found that the
Kohonen SOM performed as well as a human expert in representing term
association in the meeting output and outperformed the Hopfield neural
network algorithm. In addition, recall of consensus meeting concepts
and topics using the Kohonen algorithm was equivalent to that of the
human expert. However, precision of the Kohonen results was poor. The
graphical representation of textual data produced by the Kohonen SOM
suggests many opportunities for improving information organization of
textual information. Increasing uses of electronic mail,
computer-based bulletin board systems, and world-wide web services
present unique challenges and opportunities for a system-aided
classification approach. This research has shown that the Kohonen SOM
may be used to automatically create "a picture that can represent a
thousand (or more) words."
The Web is a hypertext body of approximately 300 million pages that
continues to grow at roughly a million pages per day. Page variation is
more prodigious than the data's raw scale: Taken as a whole, the set of
Web pages lacks a unifying structure and shows far more authoring style
and content variation than that seen in traditional text-document
collections. This level of complexity makes an "off-the-shelf"
database-management and information-retrieval solution impossible.

To date, index-based search engines for the Web 
have been the primary tool by which users search for 
information. The largest such search engines exploit 
technology's ability to store and index much of the 
Web. Such engines can therefore build giant indices 
that let you quickly retrieve the set of all Web pages 
containing a given word or string. 

Experienced users can make effective use of such engines for tasks that
can be solved by searching for tightly constrained keywords and phrases.
These search engines are, however, unsuited for a wide range of equally
important tasks. In particular, a topic of any breadth will typically
contain several thousand or million relevant Web pages. Yet a user will
be willing, typically, to look at only a few of these pages.

How then, from this sea of pages, should a search 
engine select the correct ones — those of most value to 
the user? 

AUTHORITATIVE WEB PAGES 

First, to distill a large Web search topic to a size that makes sense to
a human user, we need a means of identifying the topic's most definitive
or authoritative Web pages. The notion of authority adds a crucial second
dimension to the concept of relevance: We wish to locate not only a set
of relevant pages, but also those relevant pages of the highest quality.

Second, the Web consists not only of pages, but hyperlinks that connect
one page to another. This hyperlink structure contains an enormous amount
of latent human annotation that can help automatically infer notions of
authority. Specifically, the creation of a hyperlink by the author of a
Web page represents an implicit endorsement of the page being pointed to;
by mining the collective judgment contained in the set of such
endorsements, we can gain a richer understanding of the relevance and
quality of the Web's contents.

To address both these parameters, we began development of the Clever
system¹⁻³ three years ago. Clever is a search engine that analyzes
hyperlinks to uncover two types of pages:

• authorities, which provide the best source of
information on a given topic; and

• hubs, which provide collections of links to 
authorities. 

In this article, we outline the thinking that went into Clever's design,
report briefly on a study that compared Clever's performance to that of
Yahoo and AltaVista, and examine how our system is being extended and
updated.

FINDING AUTHORITIES 

You could use the Web's link structure in any of several ways to infer
notions of authority, some much more effective than others. Because the
link structure implies an underlying social structure in the way that
pages and links are created, an understanding of this social organization
can provide us with the most leverage. Our goal in designing algorithms
for mining link information is to develop techniques that take advantage
of what we observe about the Web's intrinsic social organization.

Search obstacles 

As we consider the types of pages we hope to discover, and to do so
automatically, we quickly confront some difficult problems. First, it is
insufficient to apply purely text-based methods to collect many
potentially relevant pages, and then comb this set for the most
authoritative ones. For example, were we to look for the Web's main
search engines, we would err badly if we searched only for "search
engines." Although the set of pages containing this term is enormous, it
does not contain most of the natural authorities we would expect to find,
such as Yahoo, Excite, InfoSeek, and AltaVista. Similarly, we cannot
expect Honda's or Toyota's home pages to contain the words "Japanese
automobile manufacturers," nor that Microsoft's or Lotus' home pages will
contain the words "software companies." Authorities are seldom
particularly self-descriptive. Large corporations design their Web pages
carefully to convey a certain feel and project the correct image, a goal
that might differ significantly from that of actually describing the
company. Thus, people outside a company frequently create more
recognizable and sometimes better judgments about it than does the
company itself.

Working with hyperlink information causes difficulties as well. Although
many links represent the type of endorsement we seek (for example, a
software engineer whose home page links to Microsoft and Lotus), others
are created for reasons that have nothing to do with conferring
authority. Some links exist purely for navigational purposes: "Click here
to return to the main menu." Others serve as paid advertisements: "The
vacation of your dreams is only a click away." We hope, however, that in
an aggregate sense, over a large enough number of links, our view of
links as conferring authority will hold.

Modeling authority conferral 

How can we best model the way in which authority is conferred on the Web?
Clearly, when commercial or competitive interests are at stake, most
organizations will perceive no benefit from linking directly to one
another. For example, AltaVista, Excite, and InfoSeek may all be
authorities for the topic "search engines," but will be unlikely to
endorse one another directly.

If the major search engines do not explicitly describe themselves as
such, how can we determine that they are indeed the most authoritative
pages for this topic? We could say that they are authorities because many
relatively anonymous pages, clearly relevant to "search engines," link to
AltaVista, Excite, and InfoSeek. Such pages are a recurring Web
component: hubs that link to a collection of prominent sites on a common
topic. Hub pages appear in a variety of forms, ranging from
professionally assembled resource lists on commercial sites to lists of
recommended links on individual home pages. These pages need not be
prominent themselves, or even have any links pointing to them. Their
distinguishing feature is that they are potent conferrers of authority on
a focused topic. In this way, they actually form a symbiotic relationship
with authorities: A good authority is a page pointed to by many good
hubs, while a good hub is a page that points to many good authorities.³

This mutually reinforcing relationship be- 
tween hubs and authorities serves as the central 
theme in our exploration of link-based meth- 
ods for search, the automated compilation of 
high-quality Web resources, and the discovery 
of thematically cohesive Web communities.

HITS: COMPUTING HUBS AND AUTHORITIES

The HITS (Hyperlink-Induced Topic Search) algo- 
rithm 3 computes lists of hubs and authorities for Web 
search topics. Beginning with a search topic, specified by 
one or more query terms, the HITS algorithm applies 
two main steps: 

• a sampling component, which constructs a 
focused collection of several thousand Web pages 
likely to be rich in relevant authorities; and 

• a weight-propagation component, which deter- 
mines numerical estimates of hub and authority 
weights by an iterative procedure. 

HITS returns as hubs and authorities for the search 
topic those pages with the highest weights. 

We view the Web as a directed graph, consisting of 
a set of nodes with directed edges between certain 
node pairs. Given any subset S of nodes, the nodes 
induce a subgraph containing all edges that connect 
two nodes in S. The HITS algorithm starts by constructing
the subgraph in which we will search for
hubs and authorities. Our goal is to have a subgraph 
rich in relevant, authoritative pages. 

To construct such a subgraph, we first use the query 
terms to collect a root set of pages (say, 200) from
an index-based search engine. We do not expect that 
this set necessarily contains authoritative pages. 
However, since many of these pages are presumably 
relevant to the search topic, we expect at least some 
of them to have links to most of the prominent author- 
ities. We therefore expand the root set into a base set 
by including all the pages that the root-set pages link 
to, and all pages that link to a page in the root set, up 
to a designated size cutoff. 
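
The expansion step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `out_links` and `in_links` mappings, the `max_size` parameter, and the page names are all assumptions introduced for the example.

```python
def build_base_set(root_set, out_links, in_links, max_size=5000):
    """Expand a root set into a base set by following links in both directions.

    out_links[p] lists the pages that p links to; in_links[p] lists the pages
    that link to p. Both mappings and max_size are illustrative assumptions.
    """
    base = list(root_set)
    seen = set(root_set)
    for page in root_set:
        neighbours = list(out_links.get(page, [])) + list(in_links.get(page, []))
        for other in neighbours:
            if len(base) >= max_size:
                return base  # designated size cutoff reached
            if other not in seen:
                seen.add(other)
                base.append(other)
    return base


# Toy link data with hypothetical page names.
out_links = {"r1": ["a1", "a2"], "r2": ["a2"]}
in_links = {"r1": ["h1"], "r2": ["h1", "h2"]}
base = build_base_set(["r1", "r2"], out_links, in_links)
print(base)
```

In practice the two mappings would come from a crawl or from a search engine's link index; the sketch only shows the set arithmetic.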

This approach follows our intuition that the promi- 
nence of authoritative pages derives typically from the 
endorsements of many relevant pages that are not, in 
themselves, prominent. We restrict our attention to 
this base set for the remainder of the algorithm. We 
find that this set typically contains from 1,000 to 
5,000 pages, and that hidden among these are many 
pages that, subjectively, can be viewed as authoritative 
for the search topic.



We work with the subgraph induced by the 
base set, with one modification. We find that 
links between two pages with the same Web 
domain frequently serve a purely navigational 
function, and thus do not confer authority. By 
"Web domain," we mean simply the first level 
in the URL string associated with a page. We 
therefore delete all links between pages with 
the same domain from the subgraph induced 
by the base set, and then apply the remainder 
of the algorithm to this modified subgraph. 
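
A minimal sketch of this filtering step follows. It assumes that the "Web domain" can be approximated by the host name of each URL; the helper names and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def web_domain(url):
    """Return the host part of a URL, used here as a stand-in for the Web domain."""
    return (urlsplit(url).hostname or "").lower()


def drop_intra_domain_links(edges):
    """Keep only links whose two endpoints lie in different domains."""
    return [(src, dst) for src, dst in edges if web_domain(src) != web_domain(dst)]


edges = [
    ("http://hub.example.org/links.html", "http://www.example.com/"),
    ("http://www.example.com/", "http://www.example.com/menu.html"),  # navigational
]
kept = drop_intra_domain_links(edges)
print(kept)
```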

We extract good hubs and authorities from 
the base set by giving a concrete numerical 
interpretation to our intuitive notions of 
authorities and hubs. We associate a nonnegative
authority weight $x_p$ and a nonnegative hub
weight $y_p$ with each page $p \in V$. We are interested
in the relative values of these weights only, not
their actual magnitudes. In our manipulation of the
weights, we apply a normalization so that their total
sum remains bounded. The actual choice of normalization
does not affect the results; we maintain the
invariant that the squares of all weights sum to 1. A
page $p$ with a large weight $x_p$ will be viewed as a "better"
authority, while a page with a large weight $y_p$ will
be viewed as a "better" hub. Since we do not impose
any a priori estimates, we set all $x$ and $y$ values to a
uniform constant initially; we will see later, however, 
that the final results are essentially unaffected by this 
initialization. 

We now update the authority and hub weights as 
follows. If a page is pointed to by many good hubs, 
we would like to increase its authority weight. Thus 
we update the value of $x_p$, for a page $p$, to be the sum
of $y_q$ over all pages $q$ that link to $p$:

$$x_p \leftarrow \sum_{q \,:\, q \to p} y_q \qquad (1)$$

where the notation $q \to p$ indicates that $q$ links to $p$.
In a strictly dual fashion, if a page points to many good
authorities, we increase its hub weight via

$$y_p \leftarrow \sum_{q \,:\, p \to q} x_q \qquad (2)$$

There is a more compact way to write these updates,
and it sheds more light on what occurs mathematically.
Let us number the pages $\{1, 2, \ldots, n\}$ and define
their adjacency matrix $A$ to be the $n \times n$ matrix whose
$(i, j)$th entry is equal to 1 if page $i$ links to page $j$, and is
0 otherwise. Let us also write the set of all $x$ values as
a vector $x = (x_1, x_2, \ldots, x_n)$, and similarly define
$y = (y_1, y_2, \ldots, y_n)$. Then our update rule for $x$ can be written
as $x \leftarrow A^T y$ and our update rule for $y$ can be written
as $y \leftarrow Ax$. Unwinding these one step further, we have

$$x \leftarrow A^T y \leftarrow A^T A x = (A^T A)x \qquad (3)$$

and

$$y \leftarrow Ax \leftarrow AA^T y = (AA^T)y. \qquad (4)$$

Thus, the vector x after multiple iterations is precisely 
the result of applying the power iteration technique to 
$A^T A$: We multiply our initial iterate by larger and larger
powers of $A^T A$. Linear algebra tells us that this sequence
of iterates, when normalized, converges to the principal
eigenvector of $A^T A$. Similarly, we discover that the
sequence of values for the normalized vector $y$ converges
to the principal eigenvector of $AA^T$. Gene Golub
and Charles Van Loan 4 describe this relationship
between eigenvectors and power iteration in detail. 

Power iteration will converge to the principal eigen- 
vector for any nondegenerate choice of initial vector — 
in our case, for example, for any vector whose entries 
are all positive. This says that the hub and authority 
weights we compute are truly an intrinsic feature of 
the linked pages collected, not an artifact of our choice 
of initial weights or the tuning of arbitrary parame- 
ters. Intuitively, the pages with large weights represent 
a very dense pattern of linkage, from pages of large 
hub weight to pages of large authority weight. 
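
The iteration described above can be sketched in a few lines of plain Python. This is an illustrative sketch rather than the authors' code: the function names, the fixed iteration count, and the toy graph are assumptions. It applies Equation 1, then Equation 2, then rescales both vectors so that their squares sum to 1.

```python
import math


def normalize(weights):
    """Scale a dict of weights so that the squares of its values sum to 1."""
    norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
    return {page: w / norm for page, w in weights.items()}


def hits(edges, iterations=20):
    """Iterate the updates of Equations 1 and 2 over (source, target) links."""
    pages = sorted({page for edge in edges for page in edge})
    auth = {page: 1.0 for page in pages}  # x values, uniform start
    hub = {page: 1.0 for page in pages}   # y values, uniform start
    for _ in range(iterations):
        # Equation 1: x_p <- sum of y_q over all q that link to p.
        new_auth = {page: 0.0 for page in pages}
        for q, p in edges:
            new_auth[p] += hub[q]
        # Equation 2: y_p <- sum of x_q over all q that p links to.
        new_hub = {page: 0.0 for page in pages}
        for p, q in edges:
            new_hub[p] += new_auth[q]
        auth, hub = normalize(new_auth), normalize(new_hub)
    return auth, hub


# Hypothetical toy graph: three hub-like pages pointing at two authorities.
edges = [("h1", "a1"), ("h1", "a2"), ("h2", "a1"), ("h2", "a2"), ("h3", "a1")]
auth, hub = hits(edges)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

On this toy graph, the page linked to by all three hubs receives the largest authority weight, and the hubs that link to both authorities outrank the one that links to only one.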

Finally, HITS outputs a short list consisting of the 
pages with the largest hub weights and the pages with 
the largest authority weights for the given search 
topic. Once the root set has been assembled, HITS is 
a purely link-based computation with no further 
regard to the query terms. Nevertheless, HITS pro- 
vides surprisingly good search results for a wide range 
of queries. For example, when tested on the sample 
query "search engines," HITS returned the top 
authorities — Yahoo, Excite, Magellan, Lycos, and 
AltaVista — even though none of these pages con- 
tained the phrase "search engines" at the time of the 
experiment. Results such as this confirm our intuition 
that in many cases the use of hyperlinks can help cir- 
cumvent some of the difficulties inherent in purely 
text-based search methods. 

Our techniques for uncovering authorities and hubs 
provide a further benefit. As the "Trawling the Web 
for Emerging Cybercommunities" sidebar shows, our 
algorithms can uncover Web communities, defined by 
a specific interest, that even a human-assisted search 
engine like Yahoo may overlook. 

COMBINING CONTENT WITH LINK INFORMATION 

Although relying extensively on links when search- 
ing for authoritative pages offers several advantages, 
ignoring textual content after assembling the root set 
can lead to difficulties. These difficulties arise from 
certain features of the Web that deviate from the pure 
hub-authority view:

• On narrowly focused topics, HITS frequently
returns good resources for a more general topic. For 
instance, the Web does not contain many resources 
for skiing in Nebraska; a query on this topic will typ- 
ically generalize to Nebraska tourist information. 

• Since all the links out of a hub page propagate 
the same weight, HITS sometimes drifts when 
hubs discuss multiple topics. For instance, a 
chemist's home page may contain good links not 
only to chemistry resources, but also to resources 
for her hobbies and regional information for her 
hometown. In such cases, HITS will confer some 
of the "chemistry" authority onto authorities for 
her hobbies and town, deeming these authorita- 
tive pages for chemistry. 

• Frequently, many pages from a single Web site will 
take over a topic simply because several of the pages 
occur in the base set. Moreover, pages from the 
same site often use the same HTML design template,
so that in addition to the information they
give on the query topic, they may all point to a sin- 
gle popular site that has little to do with the query 
topic. This inadvertent topic hijacking can give a 
site too large a share of the authority weight for the 
topic, regardless of the site's relevance. 

System heuristics 

The Clever system addresses these issues by replac- 
ing the sums of Equations 1 and 2 with weighted 
sums, assigning to each link a nonnegative weight. 
The weight assigned depends in several ways on the 
query terms and the endpoints of the link. Together 
with some additional heuristics, weighting helps mit- 
igate HITS' limitations. 

The text that surrounds hyperlink definitions (hrefs) 
in Web pages is often referred to as anchor text. In 
our setting, we choose to use anchor text to weight 
the links along which authority is propagated. A typ- 



Trawling the Web for 
Emerging Cybercommunities 

The Web harbors many communities — 
groups of content creators who share a 
common interest that manifests itself as a 
set of Web pages. Though many commu- 
nities are defined explicitly — newsgroups, 
resource collections in portals, and so on — 
many more are implicit. Using a subgraph- 
enumeration technique called trawling, we 
discovered fine-grained communities num- 
bering in the hundreds of thousands — 
many more than the number of portals and 
newsgroup topics. The following commu- 
nities are a sampling of those we have 
extracted from the Web: 

• people interested in Hekiru Shiina, a 
Japanese pop singer; 

• people who maintain information 
about fire brigades in Australia; and 

• people belonging to Turkish student 
organizations in the US. 

Identifying these communities helps us 
understand the intellectual and sociologi- 
cal evolution of the Web. It also helps pro- 
vide detailed information to groups of 
people with certain focused interests. 
Owing to these communities' astronomical
number, embryonic nature, and evolu- 
tionary flux, they are hard to track and find 
through sheer manual effort. Thus, when 



uncovering communities, we treat the Web 
as a huge directed graph, use graph struc- 
tures derived from the basic hub-author- 
ity-linkage pattern as a community's 
"signature," and systematically scan the 
Web graph to locate such structures. 

We begin with the assumption that the- 
matically cohesive Web communities con- 
tain at their core a dense pattern of linkage 
from hubs to authorities. The pattern ties 
the pages together in the link structure, 
even though hubs do not necessarily link
to hubs, and authorities do not necessarily 
link to authorities. We hypothesize that this 
pattern is a characteristic of both well- 
established and emergent communities. To 
frame this approach in more graph-theo- 
retic language, we use the notion of a 
directed bipartite graph — one whose nodes 
can be partitioned into two sets A and B 
such that every link in the graph is directed 
from a node in A to a node in B. Since the 
communities we seek contain directed 
bipartite graphs with a large density of 
edges, we expect many of them to contain 
smaller bipartite subgraphs that are in fact 
complete: Each node in A has a link to each 
node in B. 

Using a variety of pruning algorithms, 1 
we can enumerate all such complete bipar- 
tite subgraphs on the Web using only a 
standard desktop PC and about three days 
of runtime. In our experiments to date, we 



have used an 18-month-old crawl of the 
Web provided by Alexa (www.alexa.com), 
a company that archives Web snapshots. 
The process yielded about 130,000 com- 
plete bipartite graphs in which three Web 
pages all pointed to the same set of three 
other Web pages. 
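
To make the target structure concrete, here is a naive enumeration of complete bipartite subgraphs in which three pages all point to the same three other pages. It is a brute-force sketch suitable only for tiny graphs and stands in for the pruning algorithms mentioned above, which it does not reproduce; the function name and the toy data are assumptions.

```python
from itertools import combinations


def bipartite_cores(out_links, size=3):
    """List (hubs, targets) pairs where every hub links to every target.

    Brute force over all hub combinations; only practical for small graphs.
    """
    cores = []
    for hubs in combinations(sorted(out_links), size):
        # Pages that every hub in this combination links to.
        common = set.intersection(*(set(out_links[h]) for h in hubs))
        common -= set(hubs)  # keep the two sides disjoint
        for targets in combinations(sorted(common), size):
            cores.append((hubs, targets))
    return cores


# Hypothetical toy graph.
out_links = {
    "h1": ["a1", "a2", "a3"],
    "h2": ["a1", "a2", "a3", "a4"],
    "h3": ["a1", "a2", "a3"],
    "h4": ["a4"],
}
cores = bipartite_cores(out_links)
print(cores)
```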

Were these linkage patterns coinciden- 
tal? Manual inspection of a random sam- 
ple of about 400 communities suggests 
otherwise: Fewer than five percent of the 
communities we discovered lacked an 
apparent unifying topic. These bipartite 
cliques could then be fed to our HITS algo- 
rithms. These algorithms "expanded" the 
cliques to many more Web pages from the 
same community. 

Moreover, Yahoo does not list about 25 
percent of these communities, even today. 
Of those that do appear, many are not 
listed until the sixth level of the Yahoo 
topic tree. These observations lead us to 
believe that trawling a current copy of the 
Web will result in the discovery of many 
more communities that will become explicitly
recognized in the future.

ical example shows why we do so: When we
seek authoritative pages on chemistry, we might
reasonably expect to find the term "chemistry"
in the vicinity of the tails (or anchors) of the
links pointing to authoritative chemistry pages.
To this end, we boost the weights of links in
whose anchor (a fixed-width window) query
terms occur.
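
One way to picture this boost is sketched below. The text does not give the exact weighting formula, so everything specific here is an assumption: the window width of 50 characters, the additive `boost` per matching term, and the function name are all hypothetical.

```python
def anchor_weight(page_text, href_pos, query_terms, window=50, boost=1.0):
    """Weight a link by query-term matches in a fixed-width window around its href.

    href_pos is the character offset of the link in page_text. The baseline
    weight of 1.0 and the additive boost are illustrative choices.
    """
    start = max(0, href_pos - window)
    nearby = page_text[start:href_pos + window].lower()
    matches = sum(nearby.count(term.lower()) for term in query_terms)
    return 1.0 + boost * matches


page_text = (
    'Useful chemistry resources: <a href="http://chem.example.org/">periodic '
    "table</a> and my favorite recipes."
)
href_pos = page_text.index("<a href")
w_chem = anchor_weight(page_text, href_pos, ["chemistry"])
w_ski = anchor_weight(page_text, href_pos, ["skiing"])
print(w_chem, w_ski)
```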

We base a second heuristic on breaking large 
hub pages into smaller units. On a page con- 
taining many links, it is likely that not all links 
focus on a single topic. In such situations it 
becomes advantageous to treat contiguous link subsets 
as minihubs, or pagelets; we can then develop a hub 
score for each pagelet, down to the level of single links. 
We hypothesize that contiguous sets of links on a hub 
page focus more tightly on a single topic than does the 
entire page. For instance, a page may be a good hub for 
the general topic of "cars," but different portions of it
may cater to the topics of "vintage cars" and "solar-powered
cars."

We apply one further set of modifications to HITS. 
Recall that HITS deletes all links between two pages 
within the same Web domain. Because we work with 
weighted links, we can address this issue through our 
choice of weights. First, we give links within a com- 
mon domain low weight, following the rationale that 
authority should generally be conferred globally rather 
than from a local source on the same domain. Second, 
when many pages from a single domain participate as 
hubs, we scale down their weights to prevent a single 
site from becoming dominant. 
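
The two domain-based adjustments might be sketched as follows. The text does not state the actual weights, so the numbers are assumptions: intra-domain links get a fixed low weight of 0.1, and when k pages from one domain link to the same target, each of those links gets weight 1/k. Host names again stand in for Web domains.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def domain(url):
    """Host part of a URL, used as a stand-in for the Web domain."""
    return (urlsplit(url).hostname or "").lower()


def link_weights(edges, intra_domain_weight=0.1):
    """Assign a nonnegative weight to each distinct (source, target) link."""
    # For each target, count how many linking pages come from each domain.
    per_domain = defaultdict(int)
    for src, dst in edges:
        per_domain[(domain(src), dst)] += 1
    weights = {}
    for src, dst in edges:
        if domain(src) == domain(dst):
            weights[(src, dst)] = intra_domain_weight  # local endorsement
        else:
            weights[(src, dst)] = 1.0 / per_domain[(domain(src), dst)]
    return weights


edges = [
    ("http://a.example.com/1", "http://t.example.org/"),
    ("http://a.example.com/2", "http://t.example.org/"),
    ("http://b.example.net/", "http://t.example.org/"),
    ("http://t.example.org/", "http://t.example.org/about"),
]
weights = link_weights(edges)
print(weights)
```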

All these heuristics can be implemented with mini- 
mal effort and without significantly altering the math- 
ematics of Equations 1 through 4. The sums become 
weighted sums, and matrix A now has nonnegative 
real-valued entries rather than just 0s and 1s. As before,
the hub and authority scores converge to the components
of the principal eigenvectors of $AA^T$ and $A^T A$,
respectively. In our experience, the relative values of the 
large components in these vectors typically resolve 
themselves after about five power iterations, obviating 
the need for more sophisticated eigenvector computa- 
tion methods. 
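
The weighted variant can be sketched in matrix form as below: the only change from the unweighted updates is that the entries of A are arbitrary nonnegative weights. The function names, the default of five iterations (taken from the observation above), and the example matrix are illustrative assumptions.

```python
import math


def unit(vector):
    """Scale a vector so that the squares of its entries sum to 1."""
    norm = math.sqrt(sum(c * c for c in vector)) or 1.0
    return [c / norm for c in vector]


def weighted_hits(A, iterations=5):
    """Run a few power iterations with a nonnegative weight matrix A.

    A[i][j] is the weight of the link from page i to page j (0.0 if absent).
    Returns the authority vector x and the hub vector y.
    """
    n = len(A)
    x = [1.0] * n
    y = [1.0] * n
    for _ in range(iterations):
        x = [sum(A[i][j] * y[i] for i in range(n)) for j in range(n)]  # x <- A^T y
        y = [sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]  # y <- A x
        x, y = unit(x), unit(y)
    return x, y


# Hypothetical example: pages 0 and 1 act as hubs, pages 2 and 3 as authorities.
A = [
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
x, y = weighted_hits(A)
print(x, y)
```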

COMPARING CLEVER WITH 
OTHER SEARCH ENGINES 

How do the resources computed by Clever compare 
with those found by other methods? We have conducted
several user studies that compare Clever's compilations
with those generated by AltaVista
(www.altavista.com), a term-index engine, and by Yahoo
(www.yahoo.com), a manually compiled topic taxonomy
in which a team of human ontologists create
resource lists. 

In one such study, 2 which compares Clever with 



Yahoo and AltaVista, we began with a list of 26 broad 
search topics. For each topic, we took the top 10 pages 
from AltaVista, the top five hubs and five authorities 
returned by Clever, and a random set of 10 pages from 
Yahoo's most relevant node or nodes. We then inter- 
leaved these three sets into a single topic list, masking 
which method produced which page. Next, we assem- 
bled 37 users, who were required to be familiar with 
using Web browsers but who were not experts in com- 
puter science or in the 26 search topics. We then asked 
the users to visit pages from the topic lists and rank 
them as "bad," "fair," "good," or "fantastic," in 
terms of the pages' utility in providing information 
about the topic. This yielded 1,369 responses in all, 
which were then used to assess the relative quality of 
Clever, Yahoo, and AltaVista on each topic. AltaVista
failed to receive the highest evaluation for any of the 
26 topics. For the other search engines, we obtained 
the following results: 

• For 31 percent of the topics, Yahoo and Clever
received evaluations equivalent to each other 
within a threshold of statistical significance; 

• for 50 percent, Clever received a higher evalua- 
tion; and 

• for the remaining 19 percent, Yahoo received the 
higher evaluation. 

In masking the source from which each page was 
drawn, this experiment denied Yahoo one clear advan- 
tage of a manually compiled topic list: the editorial 
annotations and one-line summaries that give power- 
ful cues for deciding which link to follow. We did this 
deliberately because we sought to isolate and study 
the power of different paradigms for resource finding, 
rather than for the combined task of compilation and 
presentation. In an earlier study 1 we did not mask 
these annotations, and Yahoo's combination of links 
and presentation beat an early version of Clever. 

CONSTRUCTING TAXONOMIES 
SEMIAUTOMATICALLY 

Yahoo's large taxonomy of topics consists of a sub- 
ject tree, each node of which corresponds to a par- 
ticular topic and which is populated by relevant 
pages. Our study results suggest that Clever can be 
used to compile such large topic taxonomies auto- 
matically. 

Suppose we are given a tree of topics designed by 
domain experts. The tree can be specified by its topol- 
ogy and the labels on its nodes. We wish to populate 
each node of the tree with a collection of the best hubs 
and authorities. The following paradigm emerges: If 
we can effectively describe each node of the tree as a 
query to Clever, the Clever engine could then populate
the node as often as we please. For instance, the

Assigning Web Pages to
Categories 

In addition to finding hubs, authorities, 
and communities, hyperlinks can be used 
to categorize Web pages. Categorization 
is a process by which a system learns from 
examples to assign documents to a set of 
predefined topic categories such as those 
found in a taxonomy. Hyperlinks contain 
high-quality semantic clues to a page's 
topic; these clues are lost when the links 
are processed by a purely term-based cat- 
egorizer. Exploiting this link information 
is challenging, however, because it is 
highly noisy. Indeed, we have found that 
naive use of terms in a document's link 
neighborhood can degrade accuracy. 

HyperClass 1 embodies one approach to 
this problem, making use of robust statis- 
tical models such as Markov random 
fields (MRFs) together with a relaxation 
labeling technique. HyperClass obtains 



improved categorization accuracy by 
exploiting link information in the neigh- 
borhood around a document. The MRF 
framework applies because pages on the 
same or related topics tend to be linked 
more frequently than those on unrelated 
topics. Even if none of the linked pages' 
categories are known initially, you can 
obtain significant taxonomy improvement 
using relaxation labeling, wherein you iter- 
atively adjust the category labels of the 
linked pages and of the page to be catego- 
rized until you find the most probable con- 
figuration of class labels. In experiments 
performed 1 using preclassified samples 
from Yahoo and the US Patent Database 
(www.ibm.com/patents), HyperClass with
hyperlinks cut the patent error rate by half 
and the Yahoo documents error rate by 
two thirds. 

HyperClass is also used in a focused 
Web crawler 2 designed to search for pages 



on a particular topic or set of topics only. 
By categorizing pages as it crawls, the 
focused crawler does more than filter out 
irrelevant pages — it also uses the associ- 
ated relevance judgment, as well as a rank 
determined by a version of the Clever 
algorithm, to set the crawling priority of 
the outlinks on the pages it finds.

resources at each node could be refreshed on a nightly
basis following the one-time human effort of describ- 
ing the topics. How, then, should we describe a topic 
node to Clever? 

Most simply, we may take the name or label of the 
node as a query term. More generally, we may wish to 
use the descriptions of other nodes on the path to the 
root. For instance, if the topic headings along a root- 
to-leaf path are Business/Real Estate/Regional/United 
States/Oregon, the query "Oregon" is not accurate; 
we might prefer instead the query "Oregon real 
estate. " 

Additionally, we may provide some exemplary 
authority or hub pages for the topic. For instance, the 
sites www.att.com and www.sprint.com may be exem- 
plary authority pages for the topic "North American 
telecommunications companies." In practice, we envi- 
sion a taxonomy administrator first trying a simple 
text query to Clever. Often this query will yield a good 
collection of resources, but other times Clever may 
return a mix of high-quality and irrelevant pages. In 
such cases, the taxonomy administrator may highlight 
some of the high-quality pages in the Clever results as 
exemplary hubs, exemplary authorities, or both. This 
is akin to the well-studied technique of relevance feed- 
back in information retrieval. 

To take advantage of exemplary pages, we add an 
exemplary hub to the base set, along with all pages 
that it points to, and then increase the weights of the
links emanating from the exemplary hub in the itera- 
tive computation. We treat exemplary authorities sim- 
ilarly, except that instead of adding to the base set any 
page pointing to an exemplary authority — a heuristic 
found to pull in too many irrelevant pages — we add 
any page pointing to at least two exemplary authori- 
ties. We use a similar heuristic to delete from the base 
set user-designated "stop-sites" and their link neigh- 



borhoods. This is typically necessary because of the 
overwhelming Web presence of certain topics. For 
instance, if our topic is Building and Construction 
Supplies/Doors and Windows, the "Windows" key- 
word makes it difficult to ignore Microsoft. Stop-sit- 
ing www.microsoft.com eliminates this concern. 
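
The base-set adjustments for exemplary pages can be sketched as below. This is one reading of the description, not the Clever implementation: adding the exemplary authorities themselves to the base set is an assumption, the link reweighting and the stop-site removal are not shown, and all names and data are hypothetical.

```python
def apply_exemplars(base_set, out_links, in_links,
                    exemplary_hubs=(), exemplary_auths=(), min_shared=2):
    """Return the base set extended with exemplary pages and their neighbors.

    out_links[p] lists the pages p links to; in_links[p] lists the pages that
    link to p (assumed to contain no duplicates).
    """
    base = set(base_set)
    # Exemplary hubs: add the hub and every page it points to.
    for hub in exemplary_hubs:
        base.add(hub)
        base.update(out_links.get(hub, []))
    # Exemplary authorities: add them (an assumption), then add only pages
    # that point to at least min_shared of them.
    base.update(exemplary_auths)
    pointing = {}
    for auth in exemplary_auths:
        for page in in_links.get(auth, []):
            pointing[page] = pointing.get(page, 0) + 1
    base.update(page for page, count in pointing.items() if count >= min_shared)
    return base


out_links = {"H": ["p1", "p2"]}
in_links = {"A1": ["q1", "q2"], "A2": ["q1"]}
base = apply_exemplars({"r"}, out_links, in_links,
                       exemplary_hubs=["H"], exemplary_auths=["A1", "A2"])
print(sorted(base))
```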

Thus, we may envision a topic node being described 
to Clever as a combination of query terms, exemplified
authority and hub pages, and, optionally, stop-sites.
We have developed a Java-based graphical user
interface, called "TaxMan" (for Taxonomy Manager),
to administer such taxonomy descriptions.
Using this tool, we have constructed taxonomies with 
more than a thousand topics. We have benchmarked 
both the time spent in creating these taxonomies and 
the resultant quality of using simple text-only queries 
versus a combination of text queries and exemplary 
Web pages. In our study, we found that the average 
time spent per node grows from about seven seconds 
to roughly three minutes when you move to a combi- 
nation of text and exemplary page queries. Outside 
users quantified the increase in quality by reporting
that, when comparing the pages generated using
exemplaries to pages generated by textual queries,
they considered eight percent more of the exemplary
pages to be good link sources.

The "Assigning Web Pages to Categories" sidebar 
describes how hyperlinks can be used to establish 
clearer taxonomy categories as well. 

CITATION ANALYSIS 

The mining of Web link structures has intellectual 
antecedents in the study of social networks and cita- 
tion analysis. 5 The field of citation analysis has devel- 
oped several link-based measures of scholarly papers' 
importance, including the impact factor and influence 
weights. 5 These measures in effect identify authoritative
sources without introducing the notion of
hubs. The view of hubs and authorities as dual 
sets of important documents is inspired by the 
apparent nature of content creation on the 
Web, and indicates some of the deep contrasts 
between Web and scholarly literature content. 

The methodology of influence weights from 
citation analysis relates to a link-based search 
method developed by Sergey Brin and 
Lawrence Page. 6 They used this method as the 
basis for their Google Web search engine. 
Google first computes a score, called the 
PageRank, for every page indexed. The score
for each page is the corresponding component of the 
principal eigenvector of a matrix B, which can be 
viewed as the adjacency matrix A with a very small 
constant added to each entry. Given a query, Google 
returns pages containing the query terms, ranked in 
order of these pages' PageRanks. 

The actual implementation of Google incorporates 
several additional heuristics, similar in intent and spirit 
to those used for deriving Clever from HITS. Google 
focuses on authoritative pages, however, while Clever 
seeks both authorities and good hub pages. Some hub 
pages may have few or no links into them, giving them 
low PageRank scores and making it unlikely that 
Google would report them. Several participants in our 
user studies suggested that good hubs are especially 
useful when trying to learn about a new topic, but less 
so when seeking a very specific piece of information. 
Google and Clever also differ in their behavior 
toward topics with a commercial theme. A com- 
pany's Web-page description of itself may use terms 
and language different from those that a user might
search for. Thus, a direct search for "mainframes" 
in Google would not return IBM's home page, which 
does not contain the term "mainframes." Yet IBM 
would still be pulled in by Clever because of the 
many hub pages that describe IBM as a mainframe 
manufacturer. 

In independent work, Krishna Bharat and Monika 
R. Henzinger 7 have given several other extensions to 
the basic HITS algorithm, substantiating their 
improvements via a user study. For instance, their 
paper was the first to describe the modification in 
which the weights of multiple links from within a site 
are scaled down. 



We believe the mining of Web link topology 
has the potential for beneficial overlap with 
several areas, including the field of infor- 
mation retrieval. 8 Mining well-structured relational 
data offers another possibility. Extracting from an 
unstructured medium such as the Web a structure 
of the kind that succumbs to traditional database 
techniques 9 presents a considerable challenge. 



We hope that the techniques described here represent
a step toward meeting this challenge.

Virtual Communities: A Brief History

by 

Kathleen Cronin (kmcronin@acs.ucalgary.ca) 

With the introduction of any new technology there are unintended social effects which occur as a result 
of society's reaction to the new technology. Kiesler states that "the long run social effects of a new 
technology are not the intended ones, but have more to do with the technology's indirect demands on our 
time and attention, and with the way it changes our work habits and our interpersonal relations" (47).
The reactions to the initial introduction of the technology will in turn affect the way the technology
develops. If technology can be adapted to better fit our work habits and interpersonal relations, users 
will demand change by means of their use or dis-use of the technology. In the development of computer 
networks people have used certain features more extensively than others, which has forced programmers 
to further develop these features, though this may not have been their initial intention. In the case of 
computer networks, users have demanded more functions which would allow them to communicate with 
each other (e-mail, conferences) and as a result of these improved functions, virtual communities have 
been able to form. The first community was formed on ARPAnet, the first computer network, as
communication became easier due to the development of more sophisticated communication functions. 
In order to look at examples of the formations of early networks and communities, it is necessary to 
discuss what exactly constitutes a community. 

Community becomes a more abstract concept within computer networks due to the lack of real, physical 
boundaries. "The concept of community commonly refers to a set of social relationships that operate 
within specified boundaries or locales, but community has an ideological component as well, in that it
refers to a sense of common character, identity or interests"(Fernback and Thompson 3). Here the 
defining elements of community are the social interactions rather than boundaries. In virtual 
communities social interaction differs from other types of communities because the interactions are 
computer-mediated; however, "[t]he way in which people use CMC [computer-mediated 
communication] always will be rooted in human needs, not hardware or software"(Rheingold 4). Like 
other communities a virtual community is based on one or more common characteristics or interests of 
its members. In other words, the social relationships determine if the group is a community, and 
boundaries and the method of communication determine the type of community. 

J.C.R. Licklider and Robert Taylor, research directors for the US Department of Defense, started the 
research which led to the development of ARPAnet, the first multisite, packet-switched network.
ARPAnet was designed to connect with the Advanced Research Projects Agency (ARPA) for the
transferring of files and resource sharing. It was a simple services network for sharing news and for
many-to-many synchronous communication. The two main features were the File Transfer Protocol
(FTP) and TELNET, a remote login. E-mail was an afterthought in the development of ARPAnet, but
quickly became one of the most popular features of the system (Quarterman 36-38). By 1980 e-mail
capabilities had developed significantly. Bulletin boards were regularly used and Finger and WHOIS 
programs were developed to help people find e-mail addresses. These improvements to the initial
communication tools were made in response to user demand. Once the tools were sufficiently developed,
enough structure existed to allow users to form a community. The first virtual community on
ARPAnet was Science Fiction Lovers (SF-LOVERS), started in 1978 (Quarterman 47). At first
there were attempts to suppress it as it was viewed as a waste of resources; however, these attempts failed,
setting a precedent for the development of future communities (Rheingold 13).

The creators of ARPAnet did not intend the network to be used for the purpose of developing
communities. However, Licklider and Taylor did predict that there would be the formation of virtual 
communities. They thought that the communities would consist of members who were not necessarily 
close geographically and the binding feature of these communities would be shared interests among the 
members(Rheingold 13). Though they predicted that there would be the formation of such communities 
they did not expect these communities to develop on ARPAnet because the intended function of the
network was for military research, not social development. 

Around the same time that Licklider and Taylor were planning ARPAnet, a New Left was forming in
California. Its members promoted progressive ideas and an egalitarian society. One faction of the group
believed that technological advancement would aid in their struggle for this ideal society. "These
technophiliacs thought that the convergence of media, computing and telecommunications would
inevitably create the electronic agora - a virtual place where everyone would be able to express their
own opinions without fear of censorship" (Barbrook and Cameron 3). This faction of the New Left
believed that this convergence of technologies would create an ideal platform for free, uncensored
speech. Licklider and Taylor had predicted that virtual communities would simply be
based on shared interests. The New Left took that one step further and hypothesized (or hoped for)
radical social change.

Kiesler states that "computer networks may be can-opener technologies, making life a little easier, or
they may be something more than that - technologies that change organizations" (47). This statement
mirrors the differing predictions of the aforementioned groups. Kiesler was discussing this issue
in 1986, which shows that even then there was still no clear idea of where this technology would take us.
Today there is still room for radical social change; however, "[e]xperientially, community within
cyberspace emphasizes a community of interests, usually bounded by the topic under discussion, that
can lead to a communal spirit and apparent social bonding" (Fernback and Thompson 5). This is
generally what virtual communities are based on today. The technology has developed since the days of
ARPAnet and has become more sophisticated to better serve the needs and desires of users. What further
social changes will occur is hard to guess. As William Melody states in his article "Electronic
Networks and Changing Knowledge": "Attempts to assess the long-term social implications of
technological change in the information and communication field are made especially difficult because
of the complex methodological problems associated with network analysis. New information and
communication networks grow over time as a result of learning, adaptation through changes in personal
habits, and accompanying changes in institutional relations" (269).

References 

Barbrook, Richard and Cameron, Andy. "The Californian Ideology." Hypermedia Research Center of the
University of Westminster, London. Available: http://www.wmin.ac.uk/media/HRC/ci/calif5.html



Fernback, Jan and Thompson, Brad. "Virtual Communities: Abort, Retry, Failure?"
Available: http://www.well.com/user/hlr/texts/VCcivil.html

Hardy, H.E. "The History of the Net." Design of Information Systems, University of Michigan, School of
Information and Library Studies. December 1994.
Available: http://www.eff.org/pub/Net_culture/net.history

Kiesler, Sara. "Thinking Ahead". Harvard Business Review. January-February 1986:46-59. 

Melody, William. "Electronic Networks and Changing Knowledge". Communication Theory Today. Ed.
David Crowley and David Mitchell. Stanford: Stanford Press, 1994. 254-273.

Quarterman, John S. "The Global Matrix of Minds". Global Networks: Computers and International
Communication. Ed. Linda M. Harasim. Cambridge: The MIT Press, 1993. 35-56.

Rheingold, Howard. "A Slice of My Life in My Virtual Community." Whole Earth Review, 1992.
Available: gopher://gopher.well.sf.ca.us:70/00/Community/virtual_communities92



http://www.acs.ucalgary.ca/-dabrent/380/webproj/kathleen.html 



3/17/05 



Ref # | Hits | Search Query | DBs | Default Operator | Plurals | Time Stamp
L1 | 1 | (("20030120685") or ("23300093404")).PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 10:09
L2 | 0 | ("23300093404").PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 10:09
L3 | 1 | ("20030093404").PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:09
L4 | 1 | ("6029195").PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:10
L5 | 1 | 4 and bulletin | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:16
L6 | 1 | "5724567".PN. | USPAT; USOCR | OR | OFF | 2005/03/17 11:16
L7 | 1 | "5717923".PN. | USPAT; USOCR | OR | OFF | 2005/03/17 11:16
L8 | 1 | "5331554".PN. | USPAT; USOCR | OR | OFF | 2005/03/17 11:16
L9 | 0 | ("6029195.uref.").PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:17
L10 | 160 | "6029195".uref. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:17
L11 | 132 | 10 and @ad<"20000825" | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:29
L12 | 5 | 11 and communities | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 11:29



Search History 3/17/05 11:30:52 AM Page 1 
C:\APPS\EAST\Workspaces\09.wsp 



Ref # | Hits | Search Query | DBs | Default Operator | Plurals | Time Stamp
L1 | 0 | recommending near3 communities | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:27
L2 | 1 | recommending same communities | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:28
L3 | 1 | recommending same bulletin near2 boards | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:29
L4 | 0 | suggesting near3 (bulletin near2 boards) | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:29
L5 | 0 | suggesting same (bulletin near2 boards) | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:29
L6 | 8 | automatic near3 communities | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:32
L7 | 2 | (("5884270") or ("5862223")).PN. | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:33
L8 | 3186611 | search$5 or query$4 or surf$6 or brows$7 or queries | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:35
L9 | 4485631 | create$4 or creat$5 or form or build$5 or construct$5 or start$ or add or forming or develop$7 | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:36
L10 | 2612781 | invite$4 or ask or request$6 or contact$4 or match$4 or e-mail$4 or message | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:37
L11 | 49118 | ((e or electronic) or (mail or cyber or virtual or on-line or online or internet or web or www or cyberspace) ) near3 (gathering or communit$7 or group$4 or club$4 or forum$4 or (bulletin near2 board) or (chat) near3 (group$2 or room$2)) | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:42
L12 | 25823 | 8 and 9 and 10 and 11 | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:51
L13 | 2398581 | 9 and 10 | US-PGPUB; USPAT; EPO | OR | OFF | 2005/03/17 13:51



Search History 3/17/05 2:45:13 PM Page 1 
C:\APPS\EAST\Workspaces\09.wsp 



